Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition

Authors

  • Bhavik Vachhani
  • Chitralekha Bhat
  • Biswajit Das
  • Sunil Kumar Kopparapu
Abstract

Dysarthria is a motor speech disorder that results in mumbled, slurred or slow speech which is generally difficult for both humans and machines to understand. Traditional Automatic Speech Recognition (ASR) systems perform poorly on dysarthric speech recognition tasks. In this paper, we propose the use of deep autoencoders to enhance Mel Frequency Cepstral Coefficient (MFCC) based features in order to improve dysarthric speech recognition. Speech from healthy control speakers is used to train an autoencoder, which is in turn used to obtain an improved feature representation for dysarthric speech. Additionally, we analyze the use of severity-based tempo adaptation followed by autoencoder-based speech feature enhancement. All evaluations were carried out on the Universal Access dysarthric speech corpus. An overall absolute improvement of 16% was achieved using tempo adaptation followed by autoencoder-based speech front-end representation for DNN-HMM based dysarthric speech recognition.
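
The idea of training an autoencoder on healthy speech features and then passing dysarthric features through it can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, since the abstract gives no such details: the single tanh hidden layer, the layer sizes, the learning rate, the use of the reconstructed output as the "enhanced" features, and the synthetic random arrays standing in for real MFCC frames. It is not the authors' implementation.

```python
import numpy as np

def init_params(n_in, n_hidden, rng):
    # Small random weights; layer sizes are illustrative, not from the paper.
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_in)),
        "b2": np.zeros(n_in),
    }

def forward(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])  # hidden code
    y = h @ params["W2"] + params["b2"]           # linear reconstruction
    return h, y

def train(params, x, epochs=500, lr=0.01):
    """Full-batch gradient descent on 0.5 * mean squared frame error."""
    n = x.shape[0]
    losses = []
    for _ in range(epochs):
        h, y = forward(params, x)
        err = y - x
        losses.append(0.5 * float(np.sum(err ** 2)) / n)
        # Backpropagation through the linear output and tanh hidden layer.
        dy = err / n
        dh = (dy @ params["W2"].T) * (1.0 - h ** 2)
        grads = {
            "W2": h.T @ dy, "b2": dy.sum(axis=0),
            "W1": x.T @ dh, "b1": dh.sum(axis=0),
        }
        for name, g in grads.items():
            params[name] -= lr * g
    return losses

def enhance(params, feats, mean, std):
    # Assumption: the reconstruction is taken as the enhanced feature vector.
    _, y = forward(params, (feats - mean) / std)
    return y * std + mean

# Synthetic stand-ins for 13-dimensional MFCC frames (not real speech data).
rng = np.random.default_rng(0)
n_mfcc = 13
mixing = rng.normal(size=(4, n_mfcc))
healthy = rng.normal(size=(500, 4)) @ mixing + 0.1 * rng.normal(size=(500, n_mfcc))
dysarthric = rng.normal(size=(50, 4)) @ mixing + 1.0 * rng.normal(size=(50, n_mfcc))

# Normalize with statistics from the healthy (training) data only.
mean, std = healthy.mean(axis=0), healthy.std(axis=0)
params = init_params(n_mfcc, 8, rng)
losses = train(params, (healthy - mean) / std)
enhanced = enhance(params, dysarthric, mean, std)
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print("enhanced feature matrix shape:", enhanced.shape)
```

In a real pipeline the `enhanced` matrix would replace the raw MFCCs as input to the acoustic model; the tempo adaptation step mentioned in the abstract is not covered by this sketch.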

Similar resources

Dysarthric Speech Recognition and Offline Handwriting Recognition using Deep Neural Networks

Suhas Pillai, M.S., Rochester Institute of Technology, 2017. Supervisor: Dr. Raymond Ptucha. Millions of people around the world are diagnosed with neurological disorders like Parkinson's, cerebral palsy or amyotrophic lateral sclerosis. Due to the neurological damage as the disease progresses, the person s...

An Automatic Dysarthric Speech Recognition Approach using Deep Neural Networks

Transcribing dysarthric speech into text is still a challenging problem for state-of-the-art techniques and commercially available speech recognition systems. To improve the accuracy of dysarthric speech recognition, this paper adopts Deep Belief Neural Networks (DBNs) to model the distribution of the dysarthric speech signal. A continuous dysarthric speech recognition system is produced, in whic...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Convolutional Neural Network with Adaptive Windows for Speech Recognition

Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their performance and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

Automatic dysfluency detection in dysarthric speech using deep belief networks

Dysarthria is a speech disorder caused by difficulties in controlling muscles, such as the tongue and lips, that are needed to produce speech. These differences in motor skills cause speech to be slurred, mumbled, and spoken relatively slowly, and can also increase the likelihood of dysfluency. This includes nonspeech sounds, and ‘stuttering’, defined here as a disruption in the fluency of spee...

Publication date: 2017